Abstract: Perceptual information processing systems, both biological and non-biological, 
often consist of very elaborate algorithms designed to extract certain features or events from 
the input sensory array. Such features in vision range from simple "on-off" units to "hand" 
or "face" detectors, and are now almost countless: so many have already been discovered 
or put to use that no obvious limit is in sight. Here we attempt to place some bounds upon just 
what features are worth computing. Previously, others have proposed that useful features 
reflect "non-accidental" or "suspicious" configurations that are especially informative yet 
typical of the world (such as two parallel lines). Using a Bayesian framework, we show how 
these intuitions can be made more precise, and in the process show that useful feature- 
based inferences are highly dependent upon the context in which a feature is observed. 
For example, an inference supported by a feature at an early stage of processing when 
the context is relatively open may be nonsense in a more specific context provided by 
subsequent "higher-level" processing. Therefore, specification for a "good feature" requires 
a specification of the model class that sets the current context. We propose a general form 
for the structure of a model class, and use this structure as a basis for enumerating and 
evaluating appropriate "good features". Our conclusion is that one's cognitive capacities 
and goals are as important a part of "good features" as are the regularities of the world. 

Figure 1 Typical features proposed by machine vision, neurophysiology, and ethology. 
What common properties do these features satisfy? What makes one feature better than 
another? 



In contrast, consider configurations of features that exhibit very special relations to 
one another, such as two line segments which intersect to form a "T" or a "V", or two 
line segments that are collinear. As noted by many (Barlow, 1985; Binford, 1981; Lowe, 
1985), intuitively, such coincidences imply very special "suspicious" and informative events. 
Surprisingly, however, in an unrestricted context, such as a world where sticks are 
positioned arbitrarily, the observation of a "non-accidental" feature typically does not imply 
the intended world property. Again, context plays a crucial role, as illustrated in Figure 2 
for the T-junction, which can arise in many different ways. To correct this situation, the 
corresponding world event must express a generic regularity in that context (Bennett et 
al., 1989; Marr, 1970; Reuman & Hoffman, 1986; Witkin & Tennenbaum, 1983). Our task 
here is to make note of the conditions needed to support our intuitive notions of what 
"makes a good feature". In the process, we will place a measure on just how "good" a 
particular feature is for inferencing, and show that such measures depend upon the current 
conceptualization of the world. 

Figure 2 If the image primitives are contours (such as zero crossings), then features 
typically can be created in many ways. For example, the T-junction may arise either 
from an occlusion or from an actual T-vertex in 3D. Hence the interpretation associated 
with a feature depends strongly on the context. Alternate contexts can reverse the 
interpretation. For example, consider the peanut shape as a wire frame, or the bottom 
right figure as the view of a crack through a polygonal hole. 



2.0 Bayesian Framework 

To explore conditions that should be satisfied by a good feature, we use a probabilistic model 
as the analytical tool for modeling the perceiver's world and the reliability of its feature- 
based inferences. Our choice of a probabilistic model is not a claim that the perceiver 
necessarily has access to the various probability density functions we use in our analysis. 
Whether or not the perceiver itself needs to incorporate such a probabilistic model to 
distinguish between good and bad features, and whether the world needs to satisfy this 
particular model, are important issues addressed later in the second part of our proposal 
regarding the inference process itself. However, a Bayesian probabilistic formalism allows 
us to state clearly some conditions that a "good feature" should meet, and to explain why 
other, seemingly obvious proposals are inadequate. 

The structure of the model is as follows. The external world consists of different 
classes of objects and events. We refer to each class as a context, C, within which are 
various properties that occur probabilistically. Our canonical property is denoted simply 
by P, and we assume it occurs in context C with the conditional probability p(P|C). We 
denote the absence of property P by notP. Next, we consider that some measurements are 
taken of the objects and events in the world. We refer to a particular collection of such 
measurements as a feature F. Hence a feature will be identified with the set of all world 
events having measurements specified by F, and thus probabilities such as p(F|C) are well 
defined. We wish to study the inference that property P occurs in the world, given both 
that the world context is C and that the measurements F are satisfied. Note that the 
probabilities p(P|C) and p(F|C) are considered to be objective facts about the world (or 
at least an idealization of the world), and are not statements about the perceiver's model 
of the world. In this section we keep the issue of whether or not a perceiver needs to use 
any probabilistic model of the world quite separate from our analysis of a good feature. 

2.1 Reliable Inferences 

In the probabilistic formalism a measure of the success of inferring property P from F is 
the a posteriori probability of P given the feature F in the context C. A reliable inference 
makes this probability, namely p(P|F&C), nearly one, and the probability of an error, 
namely p(notP|F&C), nearly zero. It is convenient to consider the ratio of these two 
quantities, that is 

$$R_{post} = \frac{p(P \mid F \,\&\, C)}{p(\mathrm{not}P \mid F \,\&\, C)}. \qquad (1)$$

We consider the feature F to provide a reliable inference, in the context C, precisely when 
this probability ratio R_post is much larger than one. Below we consider how such a condition 
can be ensured. 

Bayes' rule can be used to break down the probability ratio R_post into two components. 
The first component, L, is a likelihood ratio and relates to the measurement F of property P. 
The second component is another probability ratio, R_prior, and is related to the genericity 
of the world property P in context C. The decomposition of R_post has the simple form: 

$$R_{post} = L \cdot R_{prior}. \qquad (2)$$

Here the prior probability ratio R_prior is given by (compare equation (1)) 

$$R_{prior} = \frac{p(P \mid C)}{p(\mathrm{not}P \mid C)}, \qquad (3)$$

and the likelihood ratio L is defined to be 

$$L = \frac{p(F \mid P \,\&\, C)}{p(F \mid \mathrm{not}P \,\&\, C)}. \qquad (4)$$
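
The decomposition (2) follows in one step from Bayes' rule. The short derivation below is added only as a check on the notation of equations (1), (3) and (4); it assumes p(F|C) > 0 so that the common factor can be cancelled.

```latex
\begin{align*}
R_{post}
  &= \frac{p(P \mid F \,\&\, C)}{p(\mathrm{not}P \mid F \,\&\, C)}
   = \frac{p(F \mid P \,\&\, C)\, p(P \mid C) \,/\, p(F \mid C)}
          {p(F \mid \mathrm{not}P \,\&\, C)\, p(\mathrm{not}P \mid C) \,/\, p(F \mid C)} \\
  &= \frac{p(F \mid P \,\&\, C)}{p(F \mid \mathrm{not}P \,\&\, C)}
     \cdot \frac{p(P \mid C)}{p(\mathrm{not}P \mid C)}
   = L \cdot R_{prior}.
\end{align*}
```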



From equation (2) we see that the likelihood ratio L acts as an amplification factor on the 
prior probability ratio R_prior. Thus it makes sense that a good feature F have a large 
amplification factor: 

Measurement Likelihood Condition: In context C, a good feature 
F for world property P provides a large likelihood ratio, that is, 

$$L = \frac{p(F \mid P \,\&\, C)}{p(F \mid \mathrm{not}P \,\&\, C)} \gg 1. \qquad (5)$$



At first blush, a large likelihood value for L seems sufficient to capture the intuition that 
good features should point reliably to some property in the world. However, because L 
appears as a product with R_prior in equation (2), it is clear that we can not afford to let 
the prior probability ratio R_prior become too small. That is, we also require 

Genericity Condition: Given a context C and a constant δ > 0, the 
property P occurs with probability p(P|C) > δ or, equivalently, 

$$R_{prior} = \frac{p(P \mid C)}{p(\mathrm{not}P \mid C)} > \frac{\delta}{1 - \delta} > 0. \qquad (6)$$

By "generic" we mean that P occurs with a probability greater than zero within context 
C. The Genericity Condition puts a lower bound of δ on this probability. Given that L and 
R_prior satisfy the likelihood and genericity conditions, it follows from equation (2) that 
R_post > Lδ/(1 - δ). Hence, when L >> (1 - δ)/δ, the two conditions together ensure a 
reliable inference. 
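
The interplay of the two conditions can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and all probability values are hypothetical, chosen only to show that the same large likelihood ratio yields a reliable inference for a generic property and an unreliable one for a non-generic property.

```python
def posterior_odds(p_prop, p_feat_given_prop, p_feat_given_not):
    """Return (L, R_prior, R_post) for property P and feature F in a fixed context C.

    p_prop            : p(P|C), assumed strictly between 0 and 1
    p_feat_given_prop : p(F|P&C)
    p_feat_given_not  : p(F|notP&C), assumed > 0
    """
    likelihood_ratio = p_feat_given_prop / p_feat_given_not   # equation (4)
    prior_ratio = p_prop / (1.0 - p_prop)                      # equation (3)
    return likelihood_ratio, prior_ratio, likelihood_ratio * prior_ratio  # equation (2)


def posterior_odds_direct(p_prop, p_feat_given_prop, p_feat_given_not):
    """R_post computed from the joint probabilities, as in equation (1)."""
    joint_p = p_feat_given_prop * p_prop             # p(F&P|C)
    joint_not = p_feat_given_not * (1.0 - p_prop)    # p(F&notP|C)
    return joint_p / joint_not


def lower_bound(likelihood_ratio, delta):
    """Bound R_post > L * delta / (1 - delta) implied by the Genericity Condition."""
    return likelihood_ratio * delta / (1.0 - delta)


# Hypothetical numbers: the same large L, with two different priors.
L_val, R_prior, R_post = posterior_odds(0.2, 0.9, 0.001)   # generic property
_, _, R_post_rare = posterior_odds(1e-6, 0.9, 0.001)        # non-generic property
print(L_val, R_prior, R_post)
print(R_post_rare)
```

With these illustrative values the likelihood ratio is about 900 in both cases, yet only the first case gives posterior odds well above one; the second stays far below one, which is the situation the Genericity Condition is meant to exclude.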

Figure 3 Two sticks in 3D form a near-V vertex to create property P, which projects into 
the V-junction image feature F. The resolution for the sticks forming a V is taken as a 
disc of radius ε in the image (assuming orthographic projection) and, for the 3D tolerance, 
the sphere of similar radius. Although the measurement likelihood ratio condition is 
satisfied, the conditional probability of P, given the observation F and a random world 
context, favors notP, i.e. that the endpoints of the two sticks lie at separate locations 
within the cylinder of radius ε. 

that case R_post = δ/ε^2 >> 1. But this is simply the genericity condition, which requires 
a context in which the 3D "V" structures are fairly common. In other words they are a 
regularity in that context (Bennett et al., 1989; Marr, 1970; Witkin & Tennenbaum, 1983), 
such as if we are in a blocks world where edges form V's, or perhaps another where "victory 
signs" are created by finger arrangements. Once again, then, the context plays a major role 
in the inferences that features support. 



2.3 Informativeness 



By requiring that both the genericity condition be satisfied as well as L >> 1, we now can 
be assured that the feature F in context C will be a reliable predictor of world property 
P. However, a third condition is needed to ensure that the inference of P is actually 
informative. For example, in a context of randomly placed sticks (e.g. C_open) consider a 
world property P such as two skewed sticks. For simplicity we assume an orthographic 
image mapping and let the feature F correspond to two skewed lines in the image. Then 

Figure 4 A blocks-world example where the non-accidental property "collinear" is 
ignored (see text for discussion). 



Here we have written the conditions using the probability ratios appearing in the Bayesian 
formula (2). The constant δ should be chosen such that we consider probabilities larger 
than 1 - δ as virtually certain in order that the information condition rules out features 
that simply confirm virtually certain events. Also, in terms of δ, the genericity condition 
requires that the property P have a probability larger than δ and thus P is not virtually 
impossible. The particular choice of δ and a quantitative threshold for L are left open in 
the above proposal. We expect that the choice of these quantities would depend on the 
utility or risk involved in making, or failing to make, the appropriate inferences, which we 
do not pursue here. Finally, note the desirability that the inference can be made reasonably 
often. That is, the context C should not be too rare, and given the generic property P, the 
measurements F should also be common. This new requirement has been incorporated as 
part of the informativeness condition. 



2.4 Non-monotonicity of Inferences 

We close this section with one final example of the role context plays in our proposal. Most 
people see Figure 4 as depicting three blocks: one block resting on top of another, and a 
third twisted block that lies behind. Note that two of the vertical lines associated with 
the Y-junctions are actually collinear in the image, creating the useful (non-accidental) 
collinear feature suggested by Lowe (1985). This feature certainly satisfies our likelihood 
ratio condition. So why don't we see the two blocks as having collinear edges in 3D with 
one block floating above the other? (A similar example having an accidental view of a "Y" 
vertex, due to Steve Draper, is given by Hinton (1977).) 

To understand the use of collinearity as a feature, we consider inferences appropriate 
for three different contexts. Each of these contexts is simply a statement about regularities 
in the scene generating process, and is not meant to imply different stages in the 
perceiver's visual information processing system. The first context is an "open context", 
C_open, which consists of randomly placed line segments. In particular, collinear, 
coterminating, or parallel lines in the world are non-generic (i.e. probability zero) in this context. 
Although the likelihood ratios for all these properties are easily seen to be large, 
as was the case for the "V" feature discussed earlier, the a priori probabilities for these 
"non-accidental" properties are too small to warrant their inference. Hence in the context 
C_open the overwhelmingly probable conclusion is that the collinear, coterminating, and 
parallel lines in the image simply arise due to some cause other than being the projections 
of their corresponding 3D properties. (An obvious possibility is measurement noise and a 
special view of the scene.) 

Now consider a second context, C_group, similar to the first, but with regularities added 
that make, say, collinear lines or parallel edges much more probable than they would be 
in the unstructured context C_open. For example, such a context would result if there 
are processes in the world that cause the 3D line segments or edges to form structures 
having particular regularities such as textured flow fields (Stevens, 1978; Kass & Witkin, 
1988) or blocks with parallel faces (Lowe, 1985). Now the significant prior probability of 
these specific structures in that context and the large likelihood ratio provided by the 
non-accidental feature together ensure that the inference of the corresponding 3D structure is 
reliable. Given Figure 4 in this context then, and given the alignments and parallel edges, 
one might infer that these image elements arose from a related group of 3D objects (as 
indeed they did!). 

The third context involves a collection of blocks, C_block, where the blocks can rest on 
one another or float about freely. If blocks float freely then their position and orientation 
with respect to the other blocks is assumed to be random, with vanishing a priori 
probabilities R_prior for collinear or parallel edges. So again the situation is analogous to the 
case of the V-junctions presented earlier (Figure 3). Hence, although the likelihood ratio 
L is high in context C_block, the prior probability that the two blocks would be floating in 
just such a way to make a pair of edges collinear is vanishingly small, and the resultant a 
posteriori probabilities R_post rule against the interpretation that the two edges happen to 
be collinear. Instead, we favor some other cause, such as an accidental viewpoint. Finally, 
we note in passing that the occluded twisted block in Figure 4 is seen as just that, a single 
block and not two, although none of the edges are collinear. However, in the context 
C_block it is reasonable to expect that the implicit axes of the right and left portions of the 
twisted block could be extracted. Such features satisfy a cocircularity regularity (Parent 
& Zucker, 1989), which is also a "non-accidental" property, and hence the "one block" 
inference is justified. 

Our point then is that the context in which the scene configuration arose is crucial 
to the interpretation of a feature, since a change in context can reverse the appropriate 
inference. In our example, the 3D collinearity conclusion is justified only in the middle 
context C_group; in the less structured context C_open and in the most structured context 
C_block the 3D collinear regularity for these lines is not viable. Hence the appropriate 
inference is non-monotonic with the degree of structure or specification within the context 
(McCarthy, 1980; McDermott & Doyle, 1980; Reiter, 1980; Salmon, 1967). 
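
The reversal of the inference across the three contexts can be sketched with equation (2). In the sketch below every number is hypothetical: the likelihood ratio is held fixed and large, the vanishing priors of C_open and C_block are stood in for by a very small positive value, and the decision threshold is an arbitrary illustrative choice.

```python
def posterior_odds(prior, likelihood_ratio):
    """R_post = L * R_prior, equation (2), with R_prior = prior / (1 - prior)."""
    return likelihood_ratio * prior / (1.0 - prior)


# Hypothetical values: the likelihood ratio for image collinearity is large in
# every context; only the prior of 3D collinearity changes.
LIKELIHOOD_RATIO = 1.0e3
PRIOR_3D_COLLINEAR = {
    "C_open": 1.0e-9,   # random line segments: collinearity is non-generic
    "C_group": 0.2,     # grouping processes make collinearity a regularity
    "C_block": 1.0e-9,  # freely floating blocks: alignment is accidental
}


def infer_collinear(context, threshold=10.0):
    """Accept the 3D collinearity inference only if the posterior odds exceed the threshold."""
    return posterior_odds(PRIOR_3D_COLLINEAR[context], LIKELIHOOD_RATIO) > threshold


for name in ("C_open", "C_group", "C_block"):
    print(name, infer_collinear(name))
```

Ordering the contexts by increasing structure, the inference is rejected, accepted, and rejected again, which is the non-monotonic pattern described in the text.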

3.0 Model Classes 

A major point of our analysis of "what makes a good feature" is that supportable infer- 
ences are context-sensitive. Features must be evaluated in terms of generic properties or 
regularities in a specialized context or model class, as contrasted with an open context like 
a "random-world" model. Implicit in this treatment is that the external world indeed has 
some non-arbitrary structure, and that our own internal models can express this structure 
in terms of certain regularities explicitly stated as part of the model. How are these regular- 
ities expressed in the Bayesian formalism, and how can they be mirrored in the perceiver's 
conceptualization of the world? 

In an attempt to capture the notion of a regularity within a probabilistic representational 
system of a perceiver, Barlow (1985) proposed that "good features" should satisfy the 
"suspicious coincidence" condition p(A&B) >> p(A)p(B), where A and B are two 
observations. [2] The intent of the condition is to notice special situations that are not expected by 
an independence assumption of the occurrence of A and B. Although "suspicious" implies 
to us that there is a current context, this is not an explicit part of Barlow's proposal, which 
requires the very controversial computation of estimating context-free probability distribution 
functions (i.e. p(A) = Σ p(A|C)p(C) summed over all possible contexts). Barlow (1990) 
discusses at length elsewhere how a neural system might learn the appropriate distribution 
functions (see also Clark & Yuille, 1990). 

[2] Based on the text, we assume that the intended inequality is as appears here. However, note 
that for the independent event hypothesis, the inequality can be applied in either direction. 

One way to capture the intent of Barlow's proposal within the Bayesian framework 
is to consider the feature observation in the context C_P where the associated property is 
generic, as contrasted with the current, less specialized context C where the property (or 
properties) are non-generic. More specifically, 

Suspicious Coincidence: The observation of a feature F represents 
a suspicious coincidence in the context C if there is a more specialized 
(i.e. detailed) context C_P such that, 

(i) the likelihood ratio involving feature F and property P is large in 
both contexts, and 

(ii) the probability of P in the specialized context C_P is much larger 
than in the current context C, that is 

$$p(P \mid C_P) \gg p(P \mid C). \qquad (9)$$



For example, in our discussion of the blocks in Figure 4 we first considered the open context 
C_open of random lines. The collinearity feature F has a large likelihood in context C_open, 
but the prior probability of 3D collinear lines is negligible. However, in the grouping 
context C_group, the prior probability is significant and the likelihood ratio is still large. 
Hence, we would consider the observation of collinear lines in context C_open as a suspicious 
coincidence with respect to the more structured context such as C_group. Note that this 
conclusion is not to be considered a reliable inference that context C_group actually occurs 
in the world. (An analysis similar to the one presented in Section 2 could derive suitable 
additional conditions to ensure a reliable inference of the new context.) Rather, Barlow's 
notion of suspicious coincidences simply provides an approach for chaining through to more 
detailed contexts as further regularities are uncovered and assimilated. We do not pursue 
this chaining process here, and instead concentrate on how a specific context might be 
represented. 
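
The two parts of the Suspicious Coincidence definition can be written as a simple predicate. This is only an illustrative sketch: the thresholds `large_L` and `prior_gain` are hypothetical stand-ins for "large" and ">>", which the definition deliberately leaves unquantified, and the function name is not from the original.

```python
def is_suspicious_coincidence(L_current, L_special, p_current, p_special,
                              large_L=100.0, prior_gain=100.0):
    """Check conditions (i) and (ii) for a feature F and property P.

    L_current, L_special : likelihood ratios in the current and specialized contexts
    p_current, p_special : p(P|C) and p(P|C_P)
    large_L, prior_gain  : hypothetical thresholds for 'large' and '>>'
    """
    cond_i = L_current >= large_L and L_special >= large_L
    cond_ii = p_special > prior_gain * p_current      # condition (9)
    return cond_i and cond_ii


# Collinearity observed in an open context, judged against a grouping context
# (illustrative values only).
print(is_suspicious_coincidence(1e3, 1e3, 0.0, 0.2))
```

Note that the predicate says nothing about whether the specialized context actually holds; as in the text, it only flags a candidate context worth chaining to.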

Clearly an internal model can not be expected to match exactly the behavior of external 
events. In terms of our Bayesian proposal, the internally represented probability density 
functions p(P|C_i) can not be identical to their external world counterparts, p(P|C_w), say. 
In particular, as the contexts become more and more specialized (and hence the measures 
on the probability density functions become more and more biased), the world model and 
the perceiver's conceptualizations may diverge. We would like to minimize the effects of 
this divergence. In other words, we seek model contexts, properties, and features that are 
robust under errors in our estimates of the conditional probability measures. This is a 

Figure 5 Two kinds of regularities, transverse (left) and non-transverse (right). 

lines in 3D is a non-transverse event, but two lines skewed and non-intersecting in 3-space 
would be a transverse arrangement. 

Non-transversality, then, appears at first blush to be the "non-accidental" proposal 
of Lowe (1985). However, here we use the terminology "transversal and non-transversal" 
because these terms are context-sensitive and can be applied to world models with ar- 
bitrary statistical properties. Thus, in a non-random world model, say one describing 
body parts, the arrangement such as the V-vertex which we previously considered non- 
transverse can become transverse (because this is the configuration of an arm). However, 
in this same model class, the T-junction or parallel line configuration would continue to be 
non-transverse. Still another example would be an assumed model context where objects 
are taken to obey two-fold reflectional symmetry. Then a line perpendicular to a plane will 
be a transversal arrangement, whereas in the absence of such a symmetry constraint, such 
a 90 degree intersection is non-transverse. Hence the notion of transversality also involves 
categorical properties considered special in the current model class. An important type of 
world regularity can be specified by adding on top of this categorical structure an indication 
of whether or not a particular non-transversal category has a non-zero prior probability of 
occurring. 

3.2 Key Features 

Let us define a model space M simply as a manifold constructed by parameterizing some 
modelling domain. The parameters could be involved in descriptions of (3D) position, 
attitude and shape of various parts, or reflectance properties of surfaces, or higher order 
structures such as the sounds of a babbling brook. Also various categories P are represented 
as subsets of the model space, some of which form non-transversal submanifolds within M. 
For example, our two sticks "V" example corresponds to a model space R^10, where the ten 
parameters describe the position and orientation of the sticks. Consider the category P for 
which the two sticks form a V-junction (for simplicity, with a particular pair of endpoints). 
This is a 7-dimensional hyperplane in our model space. We note in passing that this 7- 
dimensional space has other "special" configurations within it, such as the 5-dimensional 
hyperplane representing the situations when the two sticks are also collinear. 

Next we need to specify how M and the various categories are meant to represent 
(or "mirror") structure and events in the world. In particular, we assume a fixed mapping 
between events in the world and categories within M. The stick example suffices to illustrate 
the mapping between coterminating sticks in the world, and the representation of this 
event in M. To avoid unnecessary details we simply identify a world property as P_w, 
and use P_m to refer to the corresponding category within M. Given this correspondence, 
we can take a world context C_w (which the reader may assume is simply an index to an 
appropriate probability density function) along with the associated probability distribution 
p(P_w|C_w), and consider the "ideal probability distribution" induced on the model space, 
namely p(P_m|C_m) = p(P_w|C_w). Of course, this ideal probability measure is NOT to 
be considered part of the perceiver's conceptualization. However, we need to make an 
assumption about its general structure, namely 

Mode Hypothesis: Given a model space M and a context C_w, then 
the probability measure p(m|C_w) can be decomposed into the sum 
Σ_{i=0}^{n} p_i(m|C_w) for m ∈ M. Here p_0 is the background measure and p_i 
for i > 0 is a measure having support only on the non-transversal 
category P_i within M. Each of these measures is assumed to have density 
functions of the form 

$$p_i(m \mid C_w) = \mu_i(m)\, \beta_i \exp(-H_i(m \mid C_w)), \quad i = 0, \ldots, n \qquad (10)$$

for m ∈ M (see Skilling, 1991). Here μ_0 is the Lebesgue measure on 
M and μ_i for i > 0 are Lebesgue measures on the property spaces 
P_i (i.e. delta distributions). The terms β_i can be taken to be 0 or 
1, depending on whether the i-th mode is a regularity in context C_w. 
Finally, the remaining terms involving H_i provide a reweighting of the 
uniform Lebesgue measures; they are exponentiated simply to ensure 
the weights are positive. 

The Mode Hypothesis can be seen to be a hypothesis about the form of the "ideal" 
probability density, for properties within a model class (Bobick, 1987; Marr, 1970). The 
basic idea is that robust features should supply reliable inferences over a wide range of 
possible choices for the specific background probability density and for the non-transverse 
probability densities. In other words, the robustness of the inferences should follow from 
the structure of the probability density, which in the ideal case will be a collection of 
delta functions. Ideally, all the perceiver needs to maintain is the locations of these delta 
functions, but not knowledge of their probability distributions p(P_w|C_w) because typically 
this information will not be available. Instead we take the (perhaps, extreme) position that 
an assumed context, C_m, is simply a specification of which categories P_i have a non-zero 
probability mass. In terms of equation (10), C_m specifies which normalization constants 
β_i are nonzero, but says nothing about the details of the actual density functions in terms 
of the weight functions H_i(m|C_w). Different modes can be selected in different contexts, 
and that is the only control of (assumed) context the perceiver has. For convenience we 
will abuse the notation, and take p(m|C_m) to mean any one of the set of density functions 
which satisfy equation (10), and is nonzero only on the selected modes specified by the 
model context C_m. 

The stick example provides a concrete case, where the world context consisted of two 
randomly placed sticks. The particular probability density p_0 is assumed to be a smooth 
function of both the location and orientation of the two sticks. Such a distribution can be 
written in the form presented for a background measure. Many different choices for H_0 
are possible, describing for example a uniform distribution within a cube, or a Gaussian 
distribution, etc. The important property of p_0 is that, independent of the choice of H_0, 
it assigns zero probability to all non-transversal manifolds such as the P_i of M. Suppose 
there are two regularities in this particular world context. One causes the two sticks to 
form a V-junction with a non-zero probability, and the other causes these V-junctions to 
form the degenerate case of collinear sticks. Such a world satisfies the Mode Hypothesis, 
with the V-junctions and the collinear V-junctions forming the only non-transversal sets 
which have positive probability mass. Within this particular context, such regularities will 
support robust inferences from their measurements, even though the (unavailable) density 
functions associated with the perceiver's internal model space C_m do not match exactly 
the associated objective density functions in the world, namely p(P_w|C_w). 
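
The robustness claim can be illustrated with a one-dimensional toy version of the Mode Hypothesis. The sketch below is an assumption-laden simplification, not the two-stick model: the model space is the real line, the non-transversal category is the single point m = 0 carrying a delta mode of weight `mode_mass`, the feature is the measurement |m| < eps, and the feature is assumed to be observed whenever the property holds. The three background densities are hypothetical stand-ins for different choices of H_0.

```python
import math


def background_mass(eps, kind):
    """Mass that a smooth background density on the real line puts on |m| < eps."""
    if kind == "uniform":      # uniform on [-1, 1]
        return min(eps, 1.0)
    if kind == "gaussian":     # standard normal
        return math.erf(eps / math.sqrt(2.0))
    if kind == "laplace":      # unit-scale Laplace
        return 1.0 - math.exp(-eps)
    raise ValueError(kind)


def posterior_of_mode(eps, mode_mass, kind):
    """p(P | F) when P = {m = 0} carries a delta mode of weight mode_mass."""
    if mode_mass == 0.0:
        return 0.0             # mode switched off: the property is non-generic
    bg = (1.0 - mode_mass) * background_mass(eps, kind)
    return mode_mass / (mode_mass + bg)


for kind in ("uniform", "gaussian", "laplace"):
    print(kind, posterior_of_mode(1e-4, 0.05, kind))
```

For a fine measurement resolution the posterior is close to one for every background choice, so in this toy setting the inference depends only on whether the mode is present, not on the details of the background density.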

To support this claim, we now proceed to develop the relation between the special class 
of non-transverse properties P_i ⊂ M and their associated features F_i. Hence, in addition 
to a model space M, we now require a measurement space I and an imaging mapping, π, 
from M onto I. (This basic setup is similar to that used in Observer Mechanics (Bennett 
et al., 1989) with the exception that for us the various spaces and mappings are all part of 
the perceiver's representational framework. For Observer Mechanics these entities are the 
world.) Features F_i are identified with subsets or submanifolds within the measurement 
space I. To illustrate this mapping, consider again the two stick case. Then, given 
orthographic imaging, the 10-dimensional configuration space for two sticks will be imaged to 
a 6-dimensional feature space. Within this feature space is the 4-dimensional hyperplane 
(a non-transversal set) consisting of all possible images containing V-intersections. We 
assume that the imaging map π correctly models the qualitative structure of the transduction 
and subsequent measurement processes of the perceiver (again, detailed noise models are 
not assumed). Finally, we define the probability of a feature F, say p(F|P&C_w), to be 
the probability induced by the image map and the measure on M. That is, p(F|P&C_w) 
is given by the probability of the set of all models m which image to F, namely π^{-1}(F). 
Similarly, given a model context, p(F|P&C_m) is taken to mean any one of the induced 
measures consistent with the model context C_m. 
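
The probability induced on a feature by the imaging map can be estimated by sampling models and projecting them. The following Monte Carlo sketch does this for two sticks in a random world, under assumptions that are not in the original: sticks have unit length, one designated endpoint of each is uniform in the unit cube, and the tolerance eps is used both for the image disc and for the 3D sphere (as in Figure 3). The function names are hypothetical.

```python
import math
import random


def random_stick(rng, length=1.0):
    """Sample one stick: first endpoint uniform in the unit cube, direction uniform on the sphere."""
    start = tuple(rng.random() for _ in range(3))
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    end = (start[0] + length * r * math.cos(phi),
           start[1] + length * r * math.sin(phi),
           start[2] + length * z)
    return start, end


def project(stick):
    """Orthographic imaging map: drop the depth coordinate of both endpoints."""
    return tuple(point[:2] for point in stick)


def count_v_events(n_trials, eps, seed=0):
    """Count trials showing the image feature F, and those among them where P also holds."""
    rng = random.Random(seed)
    n_feature = 0
    n_property = 0
    for _ in range(n_trials):
        stick_a, stick_b = random_stick(rng), random_stick(rng)
        image_a, image_b = project(stick_a), project(stick_b)
        # F: the designated image endpoints fall within a disc of radius eps.
        if math.dist(image_a[0], image_b[0]) < eps:
            n_feature += 1
            # P: the same endpoints fall within a sphere of radius eps in 3D.
            if math.dist(stick_a[0], stick_b[0]) < eps:
                n_property += 1
    return n_feature, n_property


n_feature, n_property = count_v_events(20000, 0.1)
print(n_feature, n_property)
```

Because a 3D coincidence within eps always projects to an image coincidence within eps, the second count can never exceed the first; in a random world it is typically a small fraction of it, which is the situation described in the caption of Figure 3.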

A model class is defined to be a pair of spaces M, I, along with the imaging map π. 
In addition to these spaces a model class includes two lists of categories, one a list of model 
properties (or categories) P_i within M, the other a list of features F_i within I. Finally, a 
particular model context C_m for a perceiver is simply a selection, from the list of categories 
P_i, of those which are assumed to have a non-zero mass in the "ideal" probability measure. 
Given this framework, we obtain our robust feature: 

SIMPLIFIED INTERNAL MODEL 

A. OBJECTS in the model space are constructed from Points, 
Lines (Segments) and Planes (Facets). 

B. OBJECT ELEMENTS 

Point 
Line Segment (Bar) 
Edge (of Region) 
Corner (Facet) 

(innately) available to the perceiver. 

1. "Object" Type: point, line, segment, etc. 

2. "Object" Relations: parallel, coincident, perpendicular, collinear, 
co-planar (symmetry). 

3. "Special" Property: gravity. 

D. CONTEXT (or model class) 
Variable over contexts. 

Figure 7 The basic ingredients of the observer's internal model. 

POINT TO LINE SEGMENT 

Columns: CONCEPT, DEPICTION, CODIMENSION. 
Concepts: COINCIDENT (end); COLLINEAR (on), (off); PERPENDICULAR; CO-PLANAR; PARALLEL (undefined, N/A). 

Figure 8 Non-transverse arrangements of a point to a line segment. 

LINE TO LINE SEGMENT 

Columns: CONCEPT, DEPICTION, CODIMENSION (3D, 2D). 
Concepts: COINCIDENT; COLLINEAR; PERPENDICULAR (non-planar), (co-planar); PARALLEL. 

Figure 9 Non-transverse arrangements of one line segment to another, again in a 
"random world" context. 

LINE AND GRAVITY; POINT TO LINE SEGMENT (PLUS GRAVITY) 

Concepts: COINCIDENT; COLLINEAR; PERPENDICULAR; COPLANAR. 
New Concepts: 'VERTICAL', 'ABOVE'/'BELOW', 'HORIZONTAL' 

Figure 10 The addition of a coordinate frame, such as the gravity vector, expands the 
Key Feature possibilities. 
[Figure 11 axes: Event x, Event y, Event z.]

Figure 11 Left: A cluster (or perhaps two!) of points whose specialness is difficult to
demonstrate statistically. Right: A pattern of points that is much more easily shown to be
non-arbitrary, not only because the subspace is more coherent, but especially because
the arrangement is non-transversal for a simple line-segment model.

field). And, finally, the measurements on the image will be noisy. Hence, we can expect
to see distributions of points in the event spaces, not well-marked trajectories. Clearly a
random cluster of points, such as Figure 11a (left), cannot support a key feature, whereas
Figure 11b (right) looks promising. How then do we proceed to test whether the observed
distribution of points in the event space supports a key feature? Fortunately, a good part
of the necessary machinery is available, provided that one knows in advance the possible
model types that apply (Kendall, 1989). But this is indeed the case because all the
"low-order" types of Key Features have been enumerated. The procedure, then, is simply to
test the hypothesis that the points in the feature space support one of the Key Feature
configurations known to the perceiver. 6

4.1 Data Description 

To illustrate a version of Shape Statistics, consider the configuration in Figure 11b. We
know that the coincidence of three lines is a special configuration of codimension 2 in the
event space. The task is then to obtain a probability density function (pdf) for each line
and separately for their intersection. To estimate each line (and hence its trajectory), we
can create a density function concentrated along a 1D curve or spine, following the methods
of Leclerc (1989) or Hinton et al. (1991). Denote this spine together with its associated
pdf as a "caterpillar". An important property of these approaches is that such caterpillars
provide an appropriate form of description for each "image". In particular, for Figure 11b
we might expect that a process similar to Leclerc's would extract a description in terms of
three straight caterpillars. Their width would be determined from the scatter of the data
points perpendicular to the spine. In addition, the endpoints of the linear segments would
also be provided only to within the same resolution. Similarly, for Figure 11a, the same
process might be expected to choose a description involving only one or two blobs.

5 Note that Kendall & Kendall (1980) provide a very detailed analysis of the collinear Key Feature
applied to the data of Stonehenge in order to test the hypothesis that the alignments marked
some interesting astronomical event.

Given these descriptions it is now clear how to deal with images such as Figure 11b.
Presumably we have recovered precisely three line segments along with an estimate for
possible errors in the positions of the endpoints. This provides a "stick image", to which
we can apply our usual repertoire of Key Feature models (i.e. candidate configurations).
The only difference is that we have an explicit estimate for the noise variability, so we
could expect to get more detailed estimates of the basic probabilities and likelihoods in our
Bayesian proposal.
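
As a minimal sketch of how one straight "caterpillar" might be recovered from a scatter of
event-space points, the following fits a spine by total least squares and reports the
perpendicular scatter as the width. This is a stand-in we assume for illustration; it is
not the method of Leclerc (1989) or Hinton et al. (1991), and the function name
`fit_caterpillar` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_caterpillar(points):
    """Fit a straight spine to 2D points by total least squares (PCA).

    Returns (centroid, direction, width, endpoints), where width is the
    RMS scatter of the points perpendicular to the spine.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    centered = pts - centroid
    # Rows of vt are the principal axes: vt[0] along the spine, vt[1] across it.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction, normal = vt[0], vt[1]
    along = centered @ direction
    across = centered @ normal
    width = float(np.sqrt(np.mean(across ** 2)))
    endpoints = (centroid + along.min() * direction,
                 centroid + along.max() * direction)
    return centroid, direction, width, endpoints

# Synthetic data (assumed): points along the line y = x, with Gaussian
# scatter of standard deviation 0.1 perpendicular to the line.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 10.0, size=200)
noise = rng.normal(0.0, 0.1, size=200)
pts = np.column_stack([t, t]) / np.sqrt(2) + np.outer(noise, [-1, 1]) / np.sqrt(2)
centroid, direction, width, endpoints = fit_caterpillar(pts)
```

The returned width plays the role of the explicit noise estimate, and the endpoints are
only meaningful to within roughly that same resolution.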

It is interesting to note the similarity between our proposals for good model descriptions
and for good features. For example, the "three stick" configuration is a specialization of a
description including polynomial spines, suggesting that lower dimensional descriptive models
can be found on particular non-transversal submanifolds in higher dimensional descriptive
spaces. The observation that an interpretation is close to one of these non-transversal
sets suggests that we collapse the description to the smaller space. This is analogous to
observing a non-transversal feature in our model class.

4.2 Decision Rules 

The extraction of a good description for Figure 11b, followed by the inference of a triple
junction, is clear in principle but it raises some difficult issues. Both Figures 11a and
11b are fairly clear cut in terms of their structure, with only one model fitting very well
in either situation. However, consider adding more noise to Figure 11b to obtain some
intermediate cases. Presumably the parse into three separate lines becomes less certain, as
does the quantitative data on the parameters for the lines. In an abstract feature space the
picture is of a noise estimate associated with each feature, which covers a larger region as
the input noise is increased. A final point is that, in terms of our Bayesian proposal, the
likelihood ratio L for observing a particular regularity will decrease (basically, by adjusting
the width of the caterpillar we are keeping $p(F \mid P \,\&\, C)$ roughly constant, but this
increased width will also cause the probability of false targets,
$p(F \mid \mathrm{not}\,P \,\&\, C)$, to increase). As a
result the inferences will become less certain or, once the Informativeness Condition fails,
uninformative.
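
The trade-off just described can be sketched with a one-dimensional toy model. We assume,
purely for illustration, that the feature is accepted when a measurement falls within k
noise standard deviations of its predicted position, and that in the absence of the
property the measurement is uniform over an interval of length `extent`. The function name
and all numbers are hypothetical.

```python
import math

def likelihood_ratio(sigma, k=2.0, extent=100.0):
    """L = p(F | P & C) / p(F | not-P & C) for a 1D tolerance test.

    sigma:  measurement noise (standard deviation)
    k:      acceptance half-width in units of sigma
    extent: length of the interval over which false targets are uniform
    """
    half_width = k * sigma                   # caterpillar widens with the noise
    p_hit = math.erf(k / math.sqrt(2.0))     # constant in sigma by construction
    p_false = min(1.0, 2.0 * half_width / extent)
    return p_hit / p_false

for sigma in (0.1, 1.0, 10.0):
    print(f"sigma = {sigma:5.1f}   L = {likelihood_ratio(sigma):8.2f}")
```

In this sketch the hit probability is held fixed while the false target probability grows
linearly with the noise, so L falls toward 1, at which point observing the feature no
longer shifts the prior.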

We discussed the problem of choosing a good description of the data in the previous
section. Given a description, we are now faced with choosing an appropriate inference from
our model class. How can such a decision be made? Simple structural rules, such as choosing
the most singular model (highest codimension) consistent with the data description, or
the least singular model, can easily be shown to be inappropriate. Similarly, the maximum
likelihood description will generically be a transversal point in the feature space, and thus
the regularities will almost never be inferred. Recall that the regularities only support
strong inferences if their a posteriori probabilities are sufficiently large, and the likelihood
ratio L for features associated with properties serves as the amplification factor from a
priori to a posteriori probability ratios. A decision rule based on maximum a
posteriori probability (MAP) estimates is possible, given estimates for the prior probabilities
(Clark & Yuille, 1990). However, it is not clear that useful estimates of the priors
can simply be memorized, especially when we need these priors for each of a wide
range of contexts. Thus for MAP estimation to work we need to estimate the priors on the
fly from the model class, with the one glimmer of hope here being that the estimates may
only need to be accurate to within an order of magnitude or so. A different approach involves
placing a partial order on various possible interpretations (see Jepson and Richards,
1991, 1992). This partial order could be made on the basis of probability estimates, or some
other form of preference relation. For example, for the blocks in Figure 4 we may estimate
that a floating collinear interpretation (codimension 4) is significantly less probable than
an accidental view interpretation (codimension 1 or 2 depending on whether or not the
blocks are assumed to be right angled), especially since we have no way of explaining this
codimension 4 event. Difficult research issues remain for the resolution of these problems.
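
To make the MAP alternative concrete, here is a small sketch for a single hypothetical
decision: whether a measured angle between two lines supports the category "parallel" or
the generic category. We assume Gaussian measurement noise about zero for parallel lines
and a uniform angle over 180 degrees otherwise; the function names, noise level and priors
are illustrative assumptions, not estimates from the paper.

```python
import math

def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def map_category(angle_deg, sigma_deg, prior_parallel):
    """Pick 'parallel' or 'generic' by maximum a posteriori probability.

    Returns (label, posterior probability of 'parallel').
    """
    like_parallel = gaussian_pdf(angle_deg, sigma_deg)  # true angle is 0, plus noise
    like_generic = 1.0 / 180.0                          # true angle uniform over 180 degrees
    post_parallel = like_parallel * prior_parallel
    post_generic = like_generic * (1.0 - prior_parallel)
    label = "parallel" if post_parallel > post_generic else "generic"
    return label, post_parallel / (post_parallel + post_generic)

# Same measurement (0.5 degrees, noise 1 degree) under three assumed priors.
for prior in (0.3, 0.03, 0.001):
    label, post = map_category(0.5, 1.0, prior)
    print(f"prior = {prior:<6}  ->  {label:8s}  p(parallel | data) = {post:.3f}")
```

In this toy case the decision is the same for priors of 0.3 and 0.03, which is the sense
in which order-of-magnitude estimates may suffice, but it flips once the prior drops far
enough (here, to 0.001).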

4.3 Ideal Observers 

Recently, Bennett, Hoffman & Prakash (1989) have constructed a probabilistic framework
called "Observer Mechanics" which provides an alternative model for both the world and
the perceiver. The major component of this model is an "observer", which is the 6-tuple
$(X, Y, E, S, \pi, \eta)$ where (loosely speaking) X is a configuration space of quantities being
observed, and Y is the imaging space formed by the many-to-one mapping $\pi : X \to Y$.
Within X lies a set E of "distinguished configurations" that play the role of our
non-transversal categories. The images of configurations within E form the set of features S
observed in Y. Hence S corresponds to our non-transversal image features. Finally, for
each $s \in S$, $\eta(s, \cdot)$ is a probability measure on $\pi^{-1}(s)$.

An ideal observer is defined in terms of an unbiased measure $\mu_X$ on the configuration
space X. We take this measure to be the probability of a particular configuration in X,
but in the absence of any structuring influence producing the distinguished configurations
captured in E. That is, $\mu_X$ is analogous to our background probability distribution p.
Within this framework, an observer is then said to be ideal if

$$\mu_X(\pi^{-1}(S) - E) = 0.$$

In other words, when there is no regularity or structure in E, there is a zero probability of
observing an element of S that does not result from an element of E (i.e. the probability
of a false target is zero). In terms of our earlier example, the probability of a "V" image
feature is just the probability of the set of all configurations in X which project to S,
namely $\mu_X(\pi^{-1}(S))$. In a random stick world this probability is zero, and this implies
that the previous equation must be satisfied (see the discussion around equation (3.3) in
Observer Mechanics). Therefore, there exists an "ideal observer" for 3D T's in a random
stick world. In fact, if we identify the set E with world property P and the set
$S = \pi(E)$ with image feature F, so that $F = \pi(P)$, then in our terminology an ideal
observer can be constructed precisely when:

Ideal Observer Proposal: The image feature F is non-generic 
in the absence of world property P, and occurs with probability 
1 in the presence of world property P. 

Besides the condition that F occurs with probability one in the presence of P (which
may be regarded as a consequence of our definition of $F = \pi(P)$), the only condition on an
ideal observer is that the false target rate must be zero. Hence the measurement likelihood
ratio must be infinite. Thus ideal observers are similar to our key features, in that both
require an infinite likelihood ratio L. However, unlike key features, ideal observers include
situations such as the "V" observer in a random stick world, even in the absence of a
world regularity for "V"s. In addition, ideal observers include the case of two randomly
placed sticks, where the world property P is simply the occurrence of non-parallel sticks.
This property occurs with probability one, yet there is still a feature having an infinite
likelihood ratio. In our Bayesian proposal we include conditions that eliminate cases such
as these. In particular, the V-observer is eliminated by the requirement that the world
property is generic, and the skewed-sticks observer is eliminated by the informativeness
condition.
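
The zero-false-target condition can be checked mechanically on a finite caricature of an
observer. The sketch below assumes a configuration space with three hypothetical
configurations and treats a measure-zero set as a configuration with mass 0. The quantity
computed is the mass of configurations that project into S without belonging to E. All
names and masses are assumptions for illustration.

```python
def false_target_mass(mu, proj, distinguished):
    """Mass of configurations that project into S = pi(E) but lie outside E.

    mu:            dict configuration -> probability under the unbiased measure
    proj:          dict configuration -> image feature (the map pi)
    distinguished: set of distinguished configurations E
    """
    features = {proj[x] for x in distinguished}           # S = pi(E)
    return sum(mass for x, mass in mu.items()
               if proj[x] in features and x not in distinguished)

def is_ideal(mu, proj, distinguished):
    """An observer is ideal when the false target mass is exactly zero."""
    return false_target_mass(mu, proj, distinguished) == 0.0

# Hypothetical toy world: a true 3D "V", an accidental view that also images
# as a "V", and a generic skewed pair of sticks.
proj = {"v_3d": "V", "v_accidental": "V", "skew": "X"}
E = {"v_3d"}
random_world = {"v_3d": 0.0, "v_accidental": 0.0, "skew": 1.0}
coarse_world = {"v_3d": 0.0, "v_accidental": 0.1, "skew": 0.9}

print(is_ideal(random_world, proj, E))   # accidental views carry no mass
print(is_ideal(coarse_world, proj, E))   # accidental views carry mass 0.1
```

Note that `random_world` passes the test even though the distinguished configuration
itself has zero mass, which is exactly the degenerate "V" observer discussed above.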

Observer Mechanics recognizes this problem but deals with these degenerate cases
in a rather different manner. Both the V-observer and the skewed-sticks observer are
essentially "no-op" observers. The V-observer in a random stick world detects a feature
with probability zero, so it never reports a V observation. On the other hand, the
skewed-sticks observer detects its feature with probability one, and always responds. In both
cases, the performance has zero probability of being wrong, which justifies the term "ideal".
The conclusions of these "no-op" observers can reliably be used as input to other observers,
and that is the primary requirement on an ideal observer. The problem we posed in this
paper is different: we actually want useful, robust, and informative features. As a result,
our definition of a key feature is (roughly) a subset of the situations for which there is an
ideal observer, and to specify this subset we require structure both in the regularities of
the world and in the conceptualization of the perceiver.

A second difference between our formulation and observer theory is that, given a feature,
we attempt to make categorical statements about world properties within a model
context, whereas observer theory strives to place probability measures on world properties
that are supported by observing a particular feature. Given a feature s, the conclusion of
the observer is provided by a probability measure $\eta(s, e)$, with e in the distinguished space
E (corresponding to P). This measure $\eta(s, \cdot)$ is called the interpretation kernel. In our
framework this distribution is the a posteriori probability distribution $p(m \mid F \,\&\, P)$,
conditional on both the feature F and the property P. For example, given the skewed-sticks
observer, the interpretation kernel would provide the a posteriori probability for the 3D
position and orientation of the two sticks. In contrast, our approach provides only the
categorical response that the two sticks are indeed skewed in 3D. The computation of such
an interpretation kernel clearly involves detailed a priori probability distributions, which
we have attempted to avoid. However we note that, in situations where the priors can
be computed, the incorporation of analogs to the interpretation kernel could play a role
in extending our "categorical" good feature formulation. For our purposes in this paper,
we only point out that the most plausible approaches for the computation of these priors
involve the manipulation of assumed regularities in the world, which again ties in with our
notion of a model class.

5.0 Examples 

Our treatment of Key Features within a feature space has been limited to configurations 
built from points, lines, edges, and facets. Although we have tried to stress that these 
elemental object types are not the only primitives that one might use, it is easy to regard 
our treatment as applying only to a "blocks world". The essential point, however, is that it 
really doesn't matter what sensory attributes or dimensions we consider, nor the particular 
object types chosen as "observable" primitives in that space of features. For example, we 
could explore non-transverse configurations in time rather than space, or frequency-time as 
in an acoustic feature space (Bregman, 1990). Here, however, we will present three further 
examples taken from vision. 
Figure 12 A Key Feature for the translation direction for ego motion has the same type
of non-transversal configuration as that for finding the spectral quality of the illuminant!



addition, the power of the key feature might be further augmented if we also have extrinsic
frame vectors that act like the gravity vector in Figure 12, such as those derived from
vestibular inputs. The space housing the key feature for Ego Motion is thus much like that
shown earlier in Figure 11 (right), where events in the feature space lie on loci that radiate
from a single vertex. Here, then, we have a specific instance where noise and resolution
will affect the robustness of the key feature.
[Figure 13 labels: D (diffuse), S (specular), D (diffuse).]

Figure 13 Top: Representation in the (R, G, B) space of responses $L_1$ and $L_2$ to
two surface patches, lit by the same source S, that have different diffuse components of
reflectance ($D_1$ and $D_2$). The two planes described by $L_1$ and $L_2$ intersect along the axis
S, which describes the chromaticity of the illuminant, because the specular component
of reflectance is common to both objects. The responses from two or more objects that
define distinct planes can thus be used to find the axis S that describes the chromaticity
of the illuminant. Bottom: Projection of $L_1$ and $L_2$ onto the chromaticity plane rg-yb.
The lines described by the responses intersect at the point S marking the chromaticity of
the illuminant. If the perceiver's model incorporates the knowledge that most daylight
illuminants lie on a segment of the yb axis, as indicated, then two patches suffice to define
a "crow's foot" key feature configuration. (Adapted from D'Zmura & Lennie, 1986.)
spaces and their projections are quite consistent with our proposal, and would appear to be
physiologically plausible. However, note that in such mappings that mirror particular "real
world" properties, the codimension of a key feature becomes ambiguous and, as mentioned
earlier, it is the inferred property that is assigned the codimension associated with the
particular key feature configuration observed internally.

6.0 Summary 

Previously, others such as Binford (1981), Lowe (1985), and Witkin and Tenenbaum
(1983) have noted that good features should reflect "non-accidental" configurations that
are specially informative yet typical of the world (such as two parallel lines). However, we
note that the intuitively robust character of an inference based on a non-accidental feature is
not simply due to the fact that such features have a large likelihood ratio (i.e. the feature
is expected when the world property is present, but very rare in the absence of the property).
In the discussion of our Bayesian Proposal we have shown that a large likelihood ratio is
clearly not sufficient to ensure robust inferences (see also Knill & Kersten, 1991). Rather, the
likelihood ratio simply serves as a lever for raising the a priori probability of the particular
world property. Given too low an a priori probability this lever is insufficient to provide
a high a posteriori probability and hence a robust inference. This notion of a reasonably
large prior probability is implicit in the discussion of a non-accidental feature, and explicit
in the presentation of the intuition behind Observer Theory, yet the full impact it has on
the definition of a good feature was not made explicit.
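
The lever can be written out in odds form, which is just Bayes' rule restated with
the likelihood ratio L and the context C used earlier. The numbers in the comments are
illustrative assumptions, not estimates from the paper.

```latex
\[
\frac{p(P \mid F \,\&\, C)}{p(\mathrm{not}\,P \mid F \,\&\, C)}
  = L \cdot \frac{p(P \mid C)}{p(\mathrm{not}\,P \mid C)},
\qquad
L = \frac{p(F \mid P \,\&\, C)}{p(F \mid \mathrm{not}\,P \,\&\, C)}.
\]
% Assumed example with L = 10^{3}:
%   prior p(P | C) = 10^{-6}: posterior odds are about 10^{-3}, so p(P | F & C) is about 0.001.
%   prior p(P | C) = 10^{-1}: posterior odds are about 111,     so p(P | F & C) is about 0.991.
```

The same likelihood ratio therefore yields a negligible posterior in a context where the
property is not expected, and a near-certain one in a context where it is.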

The analysis of the two block example in Figure 4 shows that the definition of a good
feature must include a specification of the cognitive context in which it is being used.
The collinearity feature, a classic non-accidental feature, is reliable in some contexts but
nonsense in others. The difference hinges on what the perceiver is willing to assume are
regularities in the world. Thus good features are necessarily bound to the current context
of analysis, to conceptual models, and to the regularities that a perceiver expects to
be operative (MacKay, 1978, 1985). The fact that a feature can be good in one context,
but nonsense in a more specialized context, reflects a common phenomenon in inductive
inference known as non-monotonicity (Salmon, 1967). Whether your bias is for perceivers
who maintain a detailed probabilistic model of their world, or for those which use a logical
framework, this non-monotonic behaviour must be dealt with by the explicit use of
contextual information (McDermott & Doyle, 1980; Reiter, 1980).

Given that the specification of "good features" requires the specification of the current
context, we suggest a model class as an appropriate form for representing contextual
information. Basically a model class is an abstract space of models about the world, which has
been carved up into various categories. Some of the categories are transversal, representing
open subsets of the space. Other categories exist on subsets (submanifolds) of the parameter
space and have a smaller dimension than that of the embedding space. These latter
categories are non-transversal, and their degree of specialization can be roughly measured
by their codimension, that is, the difference in dimension between the embedding space and
the particular category. In addition, the model space can be projected to the image, where
a similar categorization in terms of transversal and non-transversal image features can be
made. Our canonical example is of a non-accidental property or feature such as collinear
lines, which is non-transversal in both the world and image spaces. Indeed we pursue our
proposal in some detail for such geometric features, but we also show it has applications to
other domains such as motion or colour interpretation.
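
One way to see what codimension measures is to ask how quickly the chance of landing near
a category shrinks as the tolerance shrinks. The sketch below is a toy we assume for
illustration. Each image line is given two normalized parameters (orientation and offset)
drawn uniformly, orientation wraparound is ignored, and the counts it produces come from
this parametrization rather than from the tables in the figures.

```python
import numpy as np

def near_fraction(eps, n=200_000, seed=0):
    """Fraction of random line pairs within eps of 'parallel' and of 'collinear'."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 1.0, size=(n, 2))   # normalized orientation of each line
    rho = rng.uniform(0.0, 1.0, size=(n, 2))     # normalized offset of each line
    d_theta = np.abs(theta[:, 0] - theta[:, 1])
    d_rho = np.abs(rho[:, 0] - rho[:, 1])
    parallel = d_theta < eps                     # one constraint
    collinear = parallel & (d_rho < eps)         # two constraints
    return parallel.mean(), collinear.mean()

def scaling_exponent(p_small, p_large, eps_small, eps_large):
    """Exponent k in p ~ eps**k, estimated from two tolerances."""
    return np.log(p_large / p_small) / np.log(eps_large / eps_small)

par_a, col_a = near_fraction(0.05)
par_b, col_b = near_fraction(0.10)
codim_parallel = scaling_exponent(par_a, par_b, 0.05, 0.10)
codim_collinear = scaling_exponent(col_a, col_b, 0.05, 0.10)
print(codim_parallel, codim_collinear)
```

The estimated exponents come out close to 1 and 2, the number of independent constraints
each category imposes, so in the limit of zero tolerance both categories have probability
zero under the random model but the second vanishes faster.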

So far this conceptualization is independent of whether or not certain categories support
robust inferences, in that it does not specify whether any non-transversal category
reflects a regularity in our world. There is no notion of probabilities in this categorization.
To fully specify a model class we need to select particular categories as corresponding
to regularities that are considered possible within the current context, thus entertaining
Bayesian-like propositions (Pearl, 1990). However, we prefer to keep the categorical
conceptualization itself independent of the notion of regularities, or of probabilities in the
world, to allow for the same set of categories to be used in a host of different contexts. Given
the regularities, a Key Feature supports the inference of a particular non-transversal but
generic world category (i.e. one expected or selected by the perceiver). Hence such a feature
carries within itself its appropriate interpretation, in that the regularity has already been
specified in the world, and this step of the inference process becomes rather trivial. Finally,
given the appropriate qualifications provided by the Bayesian Proposal, such a key feature
can be expected to provide a reliable inference for that particular regularity in the world.

For a structured, non-arbitrary world and for a defined set of (internal) concepts
about primitive object types and their possible relations, the set of Key Features can be
enumerated. Not all such features are equally powerful with respect to their inference
strength. As a measure of this power, we suggest the codimension of the Key Feature
configuration, with respect to the class of models computable in the feature space. Our
proposal requires a slightly different view of "feature detectors" than that customarily
taken. Rather than simply providing a "measurement" as an oriented bar mask might
do, our "feature detector" recognizes a non-transverse configuration in an event space
constructed from such measurements. The recognizable configurations are only
those non-transverse arrangements that can be computed for the types of object primitives
and relations specified. The principal task, then, is to discover the object types used to
construct the event spaces, for these will generate the model classes. We suspect that
the relations computed within the different event spaces will be similar, and relatively
trivial. Their reliability, of course, will depend upon how well the conceptual relations
and primitives match the actual building blocks and constraints imposed by Nature on
constructions in the real world.